Realizing a Rasch measurement through instructionally-sequenced domains of test items

Authors

  • Matthew Schulz
  • E Matthew Schulz
Abstract

This paper presents results from a project in which instructionally-sequenced domains were defined for the purpose of constructing measures that conform to an ideal in Guttman scaling and Rasch measurement. A fundamental idea in these measurement systems is that every person higher on the measurement scale can do everything that lower-level persons can do, plus at least one more thing. This idea has had limited application in educational measurement because of the stochastic nature of item response data and the sheer number of items needed to obtain reliable measures. However, Schulz, Lee, and Mullen [1] have shown that this ideal can be realized at a higher level of abstraction, when items within a content strand are aggregated into a small number of domains that are ordered in instructional timing and difficulty. The present paper shows how this was done, and the results obtained, in an achievement level setting project for the 2007 Grade 12 NAEP Economics Assessment.

1. Conceptual Background

Cliff remarked that a Guttman scale is one of the best examples of a good idea in all of psychometric measurement [1]. A Guttman scale typically consists of a relatively small number of ordered tasks or observational trials, where each task represents a level of proficiency [2]. Persons in the population to which the scale applies can generally be expected to have mastery of tasks up to and including their assigned level, and non-mastery of tasks representing higher levels. Because task difficulty is ordered the same way for all persons, a person's mastery of each level defining the scale can be predicted solely from the person's assigned level. Moreover, from observing a person's performance on just one level, one can predict performance on any lower level (if the performance was successful) or on any higher level (if the performance was not successful).
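The deterministic relationships just described can be illustrated with a small Python sketch. Tasks are assumed to be indexed from easiest (0) to hardest, and the helper names are hypothetical. The error count uses one simple convention (disagreements with the ideal pattern for the same total score); other counting rules exist.

```python
from typing import Dict, List, Optional

def guttman_pattern(level: int, n_tasks: int) -> List[int]:
    """Ideal response pattern for a person at `level`: success (1) on the
    `level` easiest tasks and failure (0) on all harder tasks."""
    return [1 if k < level else 0 for k in range(n_tasks)]

def guttman_errors(responses: List[int]) -> int:
    """Number of responses that disagree with the ideal pattern for the same
    total score (one simple error-counting convention, assumed here)."""
    ideal = guttman_pattern(sum(responses), len(responses))
    return sum(1 for r, i in zip(responses, ideal) if r != i)

def predict_from_one_task(task: int, passed: bool, n_tasks: int) -> Dict[int, Optional[int]]:
    """Outcomes implied for the other tasks by one observed outcome on a
    perfect Guttman scale; None means the outcome is not determined."""
    preds: Dict[int, Optional[int]] = {}
    for k in range(n_tasks):
        if k == task:
            continue
        if passed and k < task:
            preds[k] = 1      # success implies success on every easier task
        elif not passed and k > task:
            preds[k] = 0      # failure implies failure on every harder task
        else:
            preds[k] = None   # not determined by this single observation
    return preds

print(guttman_pattern(2, 4))              # [1, 1, 0, 0]
print(guttman_errors([1, 0, 1, 0]))       # 2
print(predict_from_one_task(2, True, 4))  # {0: 1, 1: 1, 3: None}
```

A perfect pattern such as [1, 1, 0, 0] yields zero errors, while [1, 0, 1, 0] (success on a harder task after failure on an easier one) does not.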
Using an example of a four-item test, Andrich [3] showed that the Rasch model [4] is a probabilistic version of a Guttman scale. In the Rasch model, Guttman levels correspond to points on the latent proficiency scale, and there are as many levels as there are binary items or rating scale categories in the assessment (as in traditional Guttman scaling). The Rasch model gives the probability of success as a function of the difference between the measure of examinee ability and the measure of task, or level, difficulty. Mastery of a level can therefore be viewed as a matter of degree, quantified by the probability of success, rather than as an all-or-none phenomenon. Importantly, however, Andrich showed that for items to be Guttman-scalable, their item characteristic curves on the latent proficiency scale must not cross. That is, the ordering of levels must be the same at all levels of ability and for all probabilities of success.

IMEKO2016 TC1-TC7-TC13 IOP Publishing Journal of Physics: Conference Series 772 (2016) 012061 doi:10.1088/1742-6596/772/1/012061 Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd

Schulz, Kolen & Nicewander [5] noted two characteristic features of educational tests, besides the stochastic nature of test items pointed out by Andrich [3], that have made it difficult to put Guttman scaling ideals into practice in education. One is the sheer number of items on educational tests. Guttman scales typically consist of 4 to 7 levels, each represented by a single task that can be observed and scored with near-perfect reliability.
On a Rasch scale, the distance between such Guttman levels would be very large, and there would be little chance that their characteristic curves would overlap. Educational tests, by contrast, require large numbers of items because the reliability of individual items is so low. These items are so closely spaced on the measurement scale that their characteristic curves do tend to cross. The sheer number of crossings, however, discourages educators from attaching much meaning to this particular violation of a measurement ideal.

The other way that educational test items differ from Guttman tasks is that they are universally regarded as exchangeable, random sampling units of larger domains. A general area of skill, such as mathematics, or even a specific skill, such as working with fractions, may be represented by hundreds, if not thousands, of test items. Virtually any skill that is a target of assessment and general inference can be assessed with multiple, exchangeable test items, at least in theory. The tasks or observations comprising a Guttman scale, in contrast, are typically treated as essential components of the scale. It is not common practice to exchange one task for another, or to have two separate versions, or forms, of a Guttman assessment, each consisting of a different set of tasks measuring the same thing.

Recognizing test items as exchangeable sampling units of a broader domain of skill, Schulz et al. [5] argued that item parameters, such as the difficulty and discrimination parameters in item response theory (IRT) models, should be treated as random variables underlying similar parameters of domains. In subsequent studies, Schulz et al. [6, 7, 8, 9] developed and applied a technique for defining a relatively small number of instructionally relevant, difficulty-ordered domains within the broader domains of educational achievement tests, including the National Assessment of Educational Progress (NAEP) in Grades 8 and 12 mathematics.
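The Rasch relationship and the non-crossing requirement can be made concrete with a minimal Python sketch. It assumes the standard dichotomous Rasch form P = 1 / (1 + exp(-(θ - b))) and takes a domain's expected percent-correct score to be the mean item success probability times 100. The function names, the ability grid, and the example difficulties are hypothetical illustrations, not the authors' procedure or NAEP data. Because dichotomous Rasch items share a common slope, individual item curves never cross; averages over domains whose item difficulties differ in spread can cross, however, so an invariant domain ordering is something to check rather than assume.

```python
import math
from typing import Dict, List

def rasch_prob(theta: float, b: float) -> float:
    """Dichotomous Rasch model: success probability depends only on theta - b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_percent_correct(theta: float, difficulties: List[float]) -> float:
    """Expected percent-correct on a domain: 100 x mean item success probability."""
    return 100.0 * sum(rasch_prob(theta, b) for b in difficulties) / len(difficulties)

def order_is_invariant(domains: Dict[str, List[float]], thetas: List[float]) -> bool:
    """True if the domains rank in the same order of expected percent-correct
    (easiest first) at every ability in `thetas`, i.e. no crossing is
    detected on that grid of abilities."""
    reference = None
    for theta in thetas:
        scores = {name: expected_percent_correct(theta, bs) for name, bs in domains.items()}
        ranking = sorted(scores, key=lambda name: scores[name], reverse=True)
        if reference is None:
            reference = ranking
        elif ranking != reference:
            return False
    return True

# Hypothetical domains with the same spread and shifted centers: curves never cross.
ordered = {"D1": [-2.0, -1.5, -1.0], "D2": [-0.5, 0.0, 0.5], "D3": [1.0, 1.5, 2.0]}
# Hypothetical domains with equal mean difficulty but different spread: curves cross.
crossing = {"narrow": [0.0, 0.0], "wide": [-3.0, 3.0]}

grid = [-3.0, -1.0, 0.0, 1.0, 3.0]
print(order_is_invariant(ordered, grid))          # True
print(order_is_invariant(crossing, [-3.0, 3.0]))  # False
```

In the crossing example, the "wide" domain is easier than the "narrow" one for low-ability persons and harder for high-ability persons, which is exactly the kind of violation that rules out a Guttman interpretation of the domains.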
Expected percent-correct scores on the domains supported the notion that the domains had the same order of difficulty for all persons. In other words, the Guttman ideal that higher-level persons can do everything lower-level persons can do, plus at least one more thing, was realized at the level of percent-correct scores on domains. Item parameters were considered only to the extent that they were expected to be normally distributed around the parameters of the domain with which they were associated; an item that was an outlier in this respect could lose its association with a domain.

In this paper, the domain definition process developed by Schulz et al. [7] is illustrated in a subject area, economics, that is not generally regarded as containing naturally ordered progressions of skill. In mathematics, skills such as addition, subtraction, multiplication, and division seem to have a natural order of difficulty. In a subject such as economics, the prospect of defining difficulty-ordered domains based on instructional sequence is less clear. The domain development work in economics conducted by ACT for the NAEP economics standard setting project [10] has not previously been reported in detail, except in technical reports and presentations delivered by the contractor to its technical advisory committee and to the Committee on Standards, Design and Methodology of the National Assessment Governing Board (NAGB).

2. Application

The achievement level setting project for the 2007 NAEP in Grade 12 economics included a process of defining difficulty-ordered domains within the Grade 12 economics assessment. The domains were intended to support the same Guttman-scale relationships as domains previously developed using items in the Grade 8 and Grade 12 NAEP mathematics assessments [6, 7, 8, 9]. A key step in the domain-development process used in those studies was to quantify the instructional timing of the test items.
In a mathematics assessment covering the primary grades, or even high school mathematics, instructional timing can be associated with grade levels or with courses in a standard high school sequence. In economics, the rating scale shown in Figure 1 was used. Five curriculum and content experts used this rating scale to rate the NAEP Grade 12 economics items, and an average rating was computed for each item.
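The averaging step can be sketched in Python as follows. The points of the Figure 1 rating scale are not reproduced here, so the sketch simply assumes numeric ratings in which a larger value means later instruction; the item identifiers, ratings, and function names are hypothetical. Sorting by the average is shown only as a convenient way to inspect the result, not as a step stated in the source.

```python
from statistics import mean
from typing import Dict, List, Tuple

def average_ratings(ratings: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean instructional-timing rating per item, across all raters."""
    return {item: float(mean(scores)) for item, scores in ratings.items()}

def order_by_timing(ratings: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Items sorted from earliest to latest average instructional timing
    (assumes a larger rating means later instruction)."""
    return sorted(average_ratings(ratings).items(), key=lambda pair: pair[1])

# Hypothetical ratings from five raters for three items.
ratings = {
    "item_A": [1, 2, 1, 1, 2],
    "item_B": [3, 3, 4, 3, 3],
    "item_C": [2, 2, 2, 3, 2],
}
print(order_by_timing(ratings))  # [('item_A', 1.4), ('item_C', 2.2), ('item_B', 3.2)]
```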

Publication date: 2016